L09 Themes

Data Visualization (STAT 302)

Author

Sherry Chen

Published

August 7, 2024

Overview

The goal of this lab is to play around with the theme options in ggplot2.

Datasets

We’ll be using the cdc.txt and NU_admission_data.csv datasets.

# load package(s)
library(tidyverse)
library(ggthemes)
library(ggplot2)
library(patchwork)
library(cowplot)
library(showtext)
library(ThemePark)
library(sysfonts)
library(dplyr)

# Enable showtext
showtext_auto()

font_add("ArialBold", "~/System/Library/Fonts/Supplemental/Arial Bold.ttf")

#font_add("CamptonBlack", "~/font/CamptonBlack.otf")

# read in the cdc dataset
cdc <- read_delim(file = "data/cdc.txt", delim = "|") |>
  mutate(
    genhlth = factor(
      genhlth,
      levels = c("excellent", "very good", "good", "fair", "poor"),
      labels = c("Excellent", "Very Good", "Good", "Fair", "Poor")
      )
    )

# read in NU admission data
nu_admission_data <- read_csv("data/NU_admission_data.csv") |> 
  janitor::clean_names()

# set seed
set.seed(2468)

# selecting a random subset of size 100
cdc_small <- cdc |> 
  slice_sample(n = 100)

Exercise 1

Use the cdc_small dataset to explore several pre-set ggthemes. The code below constructs the familiar scatterplot of weight by height and stores it in plot_01. Display plot_01 to observe the default theme. Explore/apply, and display at least 7 other pre-set themes from the ggplot2 or ggthemes package. Don’t worry about making adjustments to the figures under the new themes. Just get a sense of what the themes are doing to the original figure plot_01.

There should be at least 8 plots for this task, plot_01 is pictured below. Use patchwork or cowplot in combination with R yaml chunk options fig-height and fig-width (out-width and fig-align may be useful as well) to setup the 8 plots together in a user friendly arrangement.

Code
# plot
plot_01 <- ggplot(
  data = cdc_small,
  aes(x = height, y = weight)
) +
  geom_point(size = 3, aes(shape = genhlth, color = genhlth)) +
  scale_y_continuous(
    name = "Weight in Pounds",
    limits = c(100, 275),
    breaks = seq(100, 275, 25),
    trans = "log10",
    labels = scales::label_number(
      accuracy = 1,
      suffix = " lbs"
    )
  ) +
  scale_x_continuous(
    name = "Height in Inches",
    limits = c(60, 80),
    breaks = seq(60, 80, 5),
    labels = scales::label_number(accuracy = 1, suffix = " in")
  ) +
  scale_shape_manual(
    name = "Health?",
    labels = c(
      "Excellent", "Very Good",
      "Good", "Fair", "Poor"
    ),
    values = c(17, 19, 15, 9, 4)
  ) +
  scale_color_brewer(
    name = "Health?",
    labels = c(
      "Excellent", "Very Good",
      "Good", "Fair", "Poor"
    ),
    palette = "Set1"
  ) +
  theme(
    legend.position = "inside",
    legend.position.inside = c(1, 0),
    legend.justification = c(1, 0)
  ) +
  labs(title = "CDC BRFSS: Weight by Height")

plot_01

Solution
Code
#comparing 8 themes
plot_01
plot_01 + theme_classic()
plot_01 + theme_economist()
plot_01 + theme_fivethirtyeight()
plot_01 + theme_half_open()
plot_01 + theme_wsj()
plot_01 + theme_avatar()
plot_01 + theme_barbie()
plot_01 + theme_tufte()

Important

Which theme or themes do you particularly like? Why?

Solution

My favorite themes are theme_barbie, it has a ton of pink and barbie fonts which is super interesting considering the difference tech coding and barbie.

Exercise 2

Using plot_01 from Exercise 1 and the theme() function, attempt to construct the ugliest plot possible (example pictured below). Be creative! It should NOT look exactly like the example. Since the goal is to understand a variety of adjustments, you should use a minimum of 10 different manual adjustments within theme().

Solution
Code
# plot
plot_ugly <- ggplot(
  data = cdc_small,
  aes(x = height, y = weight)
) +
  geom_point(size = 0, aes(shape = genhlth, color = genhlth)) +
  scale_y_continuous(
    name = "Weight in Pounds",
    limits = c(100, 275),
    breaks = seq(100, 275, 25),
    trans = "log10",
    labels = scales::label_number(
      accuracy = 1,
      suffix = " lbs"
    )
  ) +
  scale_x_continuous(
    name = "Height in Inches",
    limits = c(60, 80),
    breaks = seq(60, 80, 5),
    labels = scales::label_number(accuracy = 10, suffix = " in")
  ) +
  scale_shape_manual(
    name = "Health?",
    labels = c(
      "Excellent", "Very Good",
      "Good", "Fair", "Poor"
    ),
    values = c(17, 19, 15, 9, 4)
  ) +
  scale_color_brewer(
    name = "Health?",
    labels = c(
      "Excellent", "Very Good",
      "Good", "Fair", "Poor"
    ),
    palette = "Set2"
  ) +
 theme_avatar() +
  theme(
    legend.position = c(0.5, 0.5),
    legend.justification = c(0.5, 0.5),
    plot.title = element_text(hjust = 0.5),
    axis.title = element_text(size = 15),
    axis.title.x = element_text(
      size = 15,
      face = "italic",
      color = "hotpink",
      angle = 20
    ),
    axis.text = element_text(size = 10),
    panel.grid.major = element_line(color = "white", size = 0.5),
    panel.grid.minor = element_line(color = "red", size = 0.2),
    axis.ticks.length.x = unit(.2, "in"),
    legend.background = element_rect(fill = "darkgreen", color = "black")
  ) +
  labs(title = "CDC BRFSS: Weight by Height")

plot_ugly

Exercise 3

We will be making use of your code from Exercise 3 on L07 Layers. Using the NU_admission_data.csv you created two separate plots derived from the single plot depicted in undergraduate-admissions-statistics.pdf. Style these plots so they follow a “Northwestern” theme. You are welcome to display the plots separately OR design a layout that displays both together (likely one stacked above the other).

Check out the following webpages to help create your Northwestern theme:

Additional requirement:

Use a free non-standard font from google for the title. Pick one that looks similar to a Northwestern font.

Note

I find this blog post to be extremely useful for adding fonts. Important packages for using non-standard fonts are showtext, extrafont, extrafontdb, and sysfonts. The last 3 generally just need to be installed (not loaded per session).

Solution
Code
#data wrangling (move to tidy format for data)
bar_data <- nu_admission_data |> 
  filter(year > 1999) |> 
  select(-contains("_rate")) |> 
  pivot_longer(
    cols = -year, 
    names_to = "category",
    values_to = "value"
  ) |> 
  mutate(
    bar_label = prettyNum(value, big.mark = ",")
  )

#bar plot 
plot_1_bar <- ggplot(data = bar_data, 
       mapping = aes(x = year, y = value, fill = category)
       ) +
  geom_col(position = "identity", width = 0.75) +
  geom_text(
    mapping = aes(label = bar_label),
    color = "white",
    size = 2,
    vjust = 1,
    nudge_y = -300) +
  scale_x_continuous(
    name = "Entering Year",
    breaks = 2000:2020, #seq(2000,2020),
    expand = c(0,0.25)
) + 
  scale_y_continuous(
    name = "Applications",
    limits = c(0,50000),
    breaks = seq(0,50000,5000),
    labels = scales::label_comma(),
    expand = c(0,0)
    )+
  theme_classic() +
  theme(
    legend.justification = c(0.5,1),
    legend.position = c(0.5, 1),
    legend.title = element_text(
      family = "ArialBold",
      color = "#4E2A84",
      size = 10),
    legend.text = element_text(
      family = "ArialBold",
      color = "#4E2A84",
      size = 10
    ),
    legend.direction = "horizontal" ,
    plot.title = element_text(
      hjust = 0.5,
      family = "ArialBold",
      face = "bold",
      size = 20,
      color = "#4E2A84"
    ),
    plot.subtitle = element_text(
      hjust = 0.5,
      family = "ArialBold",
      face = "bold",
      size = 15,
      color = "#4E2A84"
    ),
     axis.title.x = element_text(
      family = "ArialBold",
      color = "#4E2A84",
      face = "bold",
      size = 10
    ),
    axis.title.y = element_text(
      family = "ArialBold",
      color = "#4E2A84",
      face = "bold",
      size = 10
    )
  ) +
  scale_fill_manual(
    name = NULL, 
    limits = c("matriculants", "admitted_students", "applications"),
    labels = c("Matriculants", "Admitted_students", "Applications"),
    values = c("#4E2A84","#836EAA", "#B6ACD1")
  ) +
  labs(
    title = "Northwestern University",
    subtitle = "Undergraduate Admissions 2000-2020"
  )
Code
#data wrangling (move to tidy format for data)
rate_data <- nu_admission_data  |> 
  filter(year > 1999) |> 
  select(year, contains("_rate")) |> 
  pivot_longer(
    cols = -year, 
    names_to = "rate_type",
    values_to = "value"
  )  |> 
 mutate(
   rate_label = str_c(value, "%"),
   label_y = case_when(
   rate_type == "yield_rate" ~ value + 2,
   rate_type == "admission_rate" ~ value + 2,
   )
 )

#data wrangling (move to tidy format for data)
# Assuming rate_data is your data frame
plot_2_rate <- ggplot(data = rate_data, 
       mapping = aes(x = year, y = value, color = rate_type, shape = rate_type)
       ) +
  geom_line() +
  geom_point(mapping = aes(fill = rate_type), color = "white", size = 3, stroke = 1) +
  geom_text(
    mapping = aes(y = label_y, label = rate_label),
    size = 2,
    show.legend = FALSE
  ) +
  scale_x_continuous(
    name = "Entering Year",
    breaks = 2000:2020,
    expand = c(0,0.25)
  ) + 
  scale_y_continuous(
    name = "Rate",
    limits = c(0,60),
    labels = scales::label_percent(scale = 1),
    expand = c(0,0),
    position = "right"
  ) +
scale_color_manual(
    name = NULL, 
    values = c("admission_rate" = "#8E44AD", "yield_rate" = "#D2B4DE"),
    labels = c("Admission Rate", "Yield Rate")
  ) +
 scale_fill_manual(
    name = NULL, 
    values = c("admission_rate" = "#8E44AD", "yield_rate" = "#D2B4DE"),
    labels = c("Admission Rate", "Yield Rate")
  ) +
  scale_shape_manual(
    name = NULL, 
    limits = c("admission_rate", "yield_rate"),
    labels = c("Admission Rate", "Yield Rate"),
    values = c(21, 24)
  ) +
  theme_classic() +
  labs(
    title = "Northwestern University",
    subtitle = "Undergraduate Admissions 2000-2020"
  ) +
  theme(
    legend.text = element_text(
      family = "ArialBold",
      color = "#4E2A84",
      size = 10
    ),
    legend.justification = c(0.5, 1),
    legend.position = c(0.5, 1),
    legend.direction = "horizontal",
    plot.title = element_text(
      hjust = 0.5,
      family = "ArialBold",
      face = "bold",
      size = 20,
      color = "#4E2A84"
    ),
    plot.subtitle = element_text(
      hjust = 0.5,
      family = "ArialBold",
      face = "bold",
      size = 15,
      color = "#4E2A84"
    ),
    axis.title.x = element_text(
      family = "ArialBold",
      color = "#4E2A84",
      face = "bold",
      size = 10
    ),
    axis.title.y = element_text(
      family = "ArialBold",
      color = "#4E2A84",
      face = "bold",
      size = 10
    )
  )


plot_1_bar

Code
plot_2_rate

Challenge

Important

Challenge is optional for all students, but we recommend trying them out!

Using cdc_small dataset, re-create your own version inspired by the plot below.

Must haves:

  • Use two non-standard fonts (one for labeling the point and the other for the axes)
  • Use at least two colors (one for the added point, another for the rest of the points)
  • A curved arrow used to label the point

Using Bilbo Baggins’ responses below to the CDC BRFSS questions, add Bilbo’s data point to a scatterplot of weight by height.

  • genhlth - How would you rate your general health? fair
  • exerany - Have you exercised in the past month? 1=yes
  • hlthplan - Do you have some form of health coverage? 0=no
  • smoke100 - Have you smoked at least 100 cigarettes in your life time? 1=yes
  • height - height in inches: 46
  • weight - weight in pounds: 120
  • wtdesire - weight desired in pounds: 120
  • age - in years: 45
  • gender - m for males and f for females: m
Note

Adding non-standard fonts can be an adventure. I find this blog post to be extremely useful for adding fonts. Important packages for using non-standard fonts are showtext, extrafont, extrafontdb, and sysfonts. The last 3 generally just need to be installed (not loaded per session).

Hint:

  • Create a new dataset (maybe call it bilbo or bilbo_baggins) using either data.frame() (base R - example in book) or tibble() (tidyverse - see help documentation for the function). Make sure to use variable names that exactly match cdc’s variable names. We have provided the tidyverse approach.
  • Search google fonts to find some free fonts to use (can get free fonts from other locations)
Solution
Code
# build dataset for Bilbo Baggins
bilbo <- tibble(
  genhlth  = "fair",
  exerany  = 1,
  hlthplan = 0,
  smoke100 = 1,
  height   = 46,
  weight   = 120,
  wtdesire = 120,
  age      = 45,
  gender   = "m"
)